Periodic hierarchical load balancing for large supercomputers

Authors

  • Gengbin Zheng
  • Abhinav Bhatele
  • Esteban Meneses
  • Laxmikant V. Kalé
Abstract

Large parallel machines with hundreds of thousands of processors are being built. Ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to yield poor load balance on very large machines. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes the scalability challenges of centralized schemes and the poor solutions of traditional distributed schemes. This is done by creating multiple levels of load balancing domains which form a tree. This hierarchical method is demonstrated within a measurement-based load balancing framework in Charm++. We present techniques to deal with scalability challenges of load balancing at very large scale. We show performance data of the hierarchical load balancing method on up to 16,384 cores of the Ranger cluster (at TACC) and 65,536 cores of a Blue Gene/P at Argonne National Laboratory (ANL) for a synthetic benchmark. We also demonstrate the successful deployment of the method in a scientific application, NAMD, with results on the Blue Gene/P machine at ANL.
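
The idea of load balancing domains arranged in a tree can be illustrated with a small two-level sketch. This is not the Charm++ implementation or the algorithm evaluated in the paper; it only shows the general shape under simplifying assumptions. The names `greedy_assign` and `hierarchical_balance`, the `group_size` parameter, and the greedy heuristics at both levels are hypothetical choices made for illustration. A root step evens out total load between groups of processors, and each group then balances its own tasks independently.

```python
import heapq


def greedy_assign(tasks, bins):
    """Place (task_id, load) pairs on bins, heaviest first, least-loaded bin first."""
    heap = [(0.0, b) for b in bins]
    heapq.heapify(heap)
    placement = {b: [] for b in bins}
    for task_id, load in sorted(tasks, key=lambda t: -t[1]):
        total, b = heapq.heappop(heap)
        placement[b].append((task_id, load))
        heapq.heappush(heap, (total + load, b))
    return placement


def hierarchical_balance(tasks_by_proc, group_size):
    """Two-level sketch: balance between groups first, then within each group.

    tasks_by_proc maps a processor id to a list of (task_id, load) pairs.
    group_size is a hypothetical parameter: processors per leaf domain.
    """
    procs = sorted(tasks_by_proc)
    if not procs:
        return {}
    groups = [procs[i:i + group_size] for i in range(0, len(procs), group_size)]
    pools = [[t for p in g for t in tasks_by_proc[p]] for g in groups]
    totals = [sum(load for _, load in pool) for pool in pools]

    # Root level: repeatedly move one task from the heaviest group to the
    # lightest group, as long as the move strictly narrows the gap.
    while True:
        hi = max(range(len(pools)), key=totals.__getitem__)
        lo = min(range(len(pools)), key=totals.__getitem__)
        gap = totals[hi] - totals[lo]
        candidates = [t for t in pools[hi] if 0 < t[1] < gap]
        if not candidates:
            break
        task = max(candidates, key=lambda t: t[1])
        pools[hi].remove(task)
        pools[lo].append(task)
        totals[hi] -= task[1]
        totals[lo] += task[1]

    # Leaf level: every group balances its own pool without looking at others.
    result = {}
    for group, pool in zip(groups, pools):
        result.update(greedy_assign(pool, group))
    return result


if __name__ == "__main__":
    demo = {0: [("a", 4.0), ("b", 4.0), ("c", 4.0), ("d", 4.0)], 1: [], 2: [], 3: []}
    for proc, assigned in sorted(hierarchical_balance(demo, group_size=2).items()):
        print(proc, assigned)
```

In this sketch the leaf-level step for one group never inspects another group's tasks, which is the property that lets a hierarchical scheme avoid gathering all load data in one place. A real implementation would also limit how much per-task data reaches the root; the sketch does not attempt that.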

Similar resources

Hierarchical Partitioning and Dynamic Load Balancing for Scientific Computation

Cluster and grid computing has made hierarchical and heterogeneous computing systems increasingly common as target environments for large-scale scientific computation. A cluster may consist of a network of multiprocessors. A grid computation may involve communication across slow interfaces. Modern supercomputers are often large clusters with hierarchical network structures. For maximum efficien...

A Load Balance Methodology for Highly Compute-Intensive Applications on Grids Based on Computational Modeling

An alternative to the use of traditional supercomputers in parallel compute-intensive applications. Pools of servers, storage systems and networks in a large virtual computer system. An optimal load balancing strategy is critical in a Grid environment. Avoid processing delays and overcommitment of resources. Take into account the different computational power of each node that changes dynamical...

Load Balancing for Parallel Computing on Distributed Computers

Distributed processing can be used for solving large computation-intensive problems. A distributed system may include parallel supercomputers, networked workstations and PCs. This paper discusses load balancing of a parallel job in a distributed computation environment. The information necessary for load balancing is studied. The software tools that automatically collect the information and per...

Development and applications of a large scale fluids/structures simulation process on clusters

A modular process for efficiently solving large-scale multidisciplinary problems using single-image cluster supercomputers is presented. The process integrates disciplines with diverse physical characteristics while retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrat...

A Hierarchical Shared Memory Cluster Architecture with Load Balancing and Fault Tolerance

Recently, a great deal of attention has been paid to the design of hierarchical shared memory cluster systems. Cluster computing has made hierarchical computing systems increasingly common as target environments for large-scale scientific computations. This paper proposes a hierarchical shared memory cluster architecture with load balancing and fault tolerance. Hierarchies of shared memory and cache...

Journal title:
  • IJHPCA

Volume 25

Publication date 2011